<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <title>Apache SDAP - Science Data Analytics Platform</title>
    <link rel="shortcut icon" href="/favicon.ico" />
    <link rel="icon" type="image/png" href="/favicon.png" />
    <link rel="stylesheet" href="/css/bootstrap.min.css" />
    <link rel="stylesheet" href="/css/style.css" />
  </head>
  <body>
    <div class="container">
      <div class="logos">
        <a href="/">
          <img src="https://apache.org/logos/res/sdap/sdap-1.png" class="pull-left" />
        </a>
      </div>

      <!-- navigation bar -->
      <nav class="navbar navbar-default">
        <div class="container-fluid">
          <div class="navbar-header">
            <a class="navbar-brand" href="/">SDAP</a>
          </div>
          <div class="navbar-right">
            <ul class="nav navbar-nav">
              <li class="dropdown toggle">
                <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">About SDAP <span class="caret"></span></a>
                <ul class="dropdown-menu">
                  <li><a href="/docs">Docs</a></li>
                  <li><a href="/publications">Publications</a></li>
                  <li><a href="/projects">Projects that use SDAP</a></li>
                  <li><a href="/events">Community Events</a></li>
                </ul>
              </li>
              <li><a href="/downloads">Downloads</a></li>
              <li><a href="/blog">Blog</a></li>
              <li><a href="/team">Team &amp; Community</a></li>
<!--              <li><a href="/resources">Resources</a></li>-->
              <li class="dropdown toggle">
              	<a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Apache <span class="caret"></span></a>
                <ul class="dropdown-menu">
                  <li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
                  <li><a href="http://www.apache.org/licenses/">License</a></li>
                  <li><a href="http://www.apache.org/foundation/sponsorship">Sponsorship</a></li>
                  <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
                  <li><a href="http://www.apache.org/events/current-event/">Events</a></li>
                  <li><a href="http://www.apache.org/security/">Security</a></li>
                </ul>
              </li>
            </ul>
          </div>
        </div>
      </nav>


<h1>Blog</h1>



<a href="/update/2023/11/29/new-logo.html"><h2>New SDAP Logo</h2></a>
<p>Posted <b>2023-11-29</b> by <b>Stepheny Perez</b></p>

<p>The SDAP community is pleased to announce our new logo!</p>

<div style="text-align: center;">
	<img src="https://apache.org/logos/res/sdap/sdap-1.png" />
	Figure 1. New SDAP Logo
</div>




<a href="/release/2023/01/20/v1.0.0-release.html"><h2>V1.0.0 Release</h2></a>
<p>Posted <b>2023-01-20</b> by <b>Riley Kuttruff</b></p>

<p>The SDAP community is pleased to announce our first release of Apache SDAP (incubating), version 1.0.0!</p>

<p><a href="https://github.com/apache/incubator-sdap-nexus/blob/91fe0ec386a77c7a48b073e8268dea004ec7939e/docs/build.rst">Instructions for building Docker images from the source release</a></p>

<p><a href="https://incubator-sdap-nexus.readthedocs.io/en/latest/quickstart.html">Instructions for deploying locally to test</a></p>

<p>Release notes:</p>
<ul>
  <li><a href="https://github.com/apache/incubator-sdap-nexus/blob/bf65205c8dae838d95843de34c94d6d46a579308/CHANGELOG.md">Nexus</a></li>
  <li><a href="https://github.com/apache/incubator-sdap-ingester/blob/ae25aae70cc9b91bd0b05788e88e77edc77f0fd3/CHANGELOG.md">Ingester</a></li>
</ul>

<p>Resources:</p>
<ul>
  <li>GitHub:
    <ul>
      <li><a href="https://github.com/apache/incubator-sdap-nexus">Nexus</a></li>
      <li><a href="https://github.com/apache/incubator-sdap-ingester">Ingester</a></li>
      <li><a href="https://github.com/apache/incubator-sdap-nexusproto">Nexus Proto</a></li>
    </ul>
  </li>
  <li><a href="https://issues.apache.org/jira/projects/SDAP/issues">Jira</a></li>
  <li><a href="mailto:dev@sdap.apache.org">Mailing list</a></li>
</ul>




<a href="/weekly/update/2018/04/23/vocabulary-similarity-algorithm.html"><h2>An introduction to MUDROD vocabulary similarity calculation algorithm</h2></a>
<p>Posted <b>2018-04-23</b> by <b>Lewis John McGibbney</b></p>
<p>Big geospatial data have been produced, archived and made available online, but finding the right data for scientific research and decision-support applications remains a significant challenge. A long-standing problem in data discovery is how to locate, assimilate and utilize the semantic context for a given query. Most past research in the geospatial domain attempts to solve this problem through two approaches: 1) building a domain-specific ontology manually; 2) discovering semantic relationships automatically from dataset metadata using machine learning techniques. The former contains rich expert knowledge, but it is static, costly, and labour intensive; the latter is automatic, but it is prone to noise.</p>

<p>An emerging trend in information science is to take advantage of large-scale user search history, which is dynamic but contains user- and crawler-generated noise. To leverage the benefits of all three approaches while avoiding their weaknesses, this article proposes a novel approach that 1) discovers vocabulary semantic relationships from user clickstreams; 2) refines the similarity calculation methods from existing ontology; 3) integrates the results of ontology, metadata, user search history and clickstream analysis to better determine the semantic relationships.</p>

<center>
	<img src="/images/vocabulary.png" />
	Figure 1. System workflow and architecture
</center>

<p>The system starts by pre-processing raw web logs, metadata, and ontology (Figure 1). After the pre-processing step, search history and clickstream data are extracted from raw logs, selected properties are extracted from metadata, and ocean-related triples are extracted from the SWEET ontology. These four types of processed data are then passed to their corresponding processors, as discussed in the last section. Once all the processors finish their jobs, the results of the different methods are integrated to produce a final list of the most related terms.</p>
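
<p>The integration step can be illustrated with a minimal sketch. The excerpt does not specify how MUDROD combines the four results, so the weighted average below is only an assumption, and the function name, the weights and the sample similarities are all hypothetical.</p>

```python
from collections import defaultdict


def integrate_similarities(sources, weights, top_n=5):
    """Combine per-source term similarities into one ranked list.

    sources: {source name: {term: similarity in [0, 1]}}
    weights: {source name: non-negative weight} (assumed values)
    """
    combined = defaultdict(float)
    total_weight = sum(weights[name] for name in sources)
    for name, scores in sources.items():
        for term, score in scores.items():
            # A term missing from a source contributes nothing for that source.
            combined[term] += weights[name] * score / total_weight
    # Highest combined similarity first; ties are broken alphabetically.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up similarities to the query term "sea surface temperature".
sources = {
    "search_history": {"sst": 0.9, "ocean temperature": 0.6},
    "clickstream": {"sst": 0.8, "ghrsst": 0.7},
    "metadata": {"ocean temperature": 0.5, "ghrsst": 0.3},
    "ontology": {"sst": 1.0},
}
weights = {"search_history": 1.0, "clickstream": 1.0, "metadata": 1.0, "ontology": 1.0}
related_terms = integrate_similarities(sources, weights)
```

<p>With equal weights, a term found by several processors outranks a term found by only one, which is the intuition behind merging the sources rather than trusting any single one.</p>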




<a href="/weekly/update/2018/04/23/recommendation-algorithms.html"><h2>An introduction to MUDROD recommendation algorithm</h2></a>
<p>Posted <b>2018-04-23</b> by <b>Lewis John McGibbney</b></p>
<p>With the recent advances in remote sensing satellites and other sensors, geographic datasets have been growing faster than ever. In response, a number of Spatial Data Infrastructure (SDI) components (e.g. catalogues and portals) have been developed to archive those datasets and make them available online. However, finding the right data for scientific research and application development is still a challenge due to the lack of data relevancy information.</p>

<p>Recommendation systems have become extremely common in recent years and are used in a variety of areas to help users quickly find useful information. We propose a recommendation system to improve geographic data discovery by mining and utilizing metadata and usage logs. Metadata abstracts are processed with natural language processing methods to find semantic relationships between metadata records. Metadata variables are used to calculate spatial and temporal similarity between metadata records. In addition, portal logs are analysed to introduce user preference.</p>

<center>
	<img src="/images/recommendation.png" />
	Figure 1. Recommendation workflow
</center>

<p>The system starts by pre-processing raw web logs and metadata (Figure 1). After the pre-processing step, sessions are reconstructed from raw web logs and then used to calculate session-based metadata similarity. Metadata are harvested from <a href="https://podaac.jpl.nasa.gov/ws">PO.DAAC web service APIs</a>. Metadata variable values are then converted to a unified unit so that metadata content similarity can be calculated. Elasticsearch is used to store all of the above similarities. Once a user views a metadata record, the system finds the top-k related metadata records with a hybrid recommendation methodology. The hybrid recommendation module integrates results from content-based and session-based recommendation methods and ranks the final recommendation list in descending order of similarity.</p>
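
<p>As a rough illustration of the hybrid step, the sketch below mixes the two similarity sources with a single weight and returns the top-k records in descending order. The linear mix, the <code>content_weight</code> parameter and the dataset identifiers are assumptions made for this example and are not taken from MUDROD itself.</p>

```python
def hybrid_recommend(content_sim, session_sim, k=3, content_weight=0.5):
    """Return the top-k related records, most similar first.

    content_sim / session_sim: {record id: similarity to the viewed record}
    content_weight: assumed mixing weight; sessions get 1 - content_weight
    """
    candidates = set(content_sim) | set(session_sim)
    scores = {
        record: content_weight * content_sim.get(record, 0.0)
        + (1.0 - content_weight) * session_sim.get(record, 0.0)
        for record in candidates
    }
    # Descending similarity; ties are broken by record id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical similarities for one viewed metadata record.
content_sim = {"dataset_a": 0.8, "dataset_b": 0.4, "dataset_c": 0.1}
session_sim = {"dataset_b": 1.0, "dataset_c": 0.3, "dataset_d": 0.6}
top_related = hybrid_recommend(content_sim, session_sim, k=3)
top_related_ids = [record for record, _ in top_related]
```

<p>A record that appears in only one source still gets a score, so datasets with little usage history can be recommended on content similarity alone.</p>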




<a href="/weekly/update/2018/04/23/ranking-algorithms.html"><h2>An introduction to MUDROD ranking algorithm</h2></a>
<p>Posted <b>2018-04-23</b> by <b>Lewis John McGibbney</b></p>
<p>When a user types some keywords into a search engine, there are typically hundreds, or even thousands, of datasets related to the given query. Although a high level of recall can be useful in some cases, the user is only interested in a much smaller subset. Current search engines in most geospatial data portals tend to induce end users to focus on a single data characteristic or feature dimension (e.g., spatial resolution), which often results in a less than optimal user experience (Ghose, Ipeirotis, and Li 2012).</p>

<p>To overcome this fundamental ranking problem, we 1) identify a number of ranking features of geospatial data to represent users’ multidimensional preferences by considering semantics, user behaviour, spatial similarity, and static dataset metadata attributes; 2) apply a machine learning method to automatically learn, from a training set, a function capable of ranking geospatial data according to the ranking features.</p>

<p>Within the ranking process, each query is associated with a set of datasets, and each dataset can be represented as a feature vector. The eleven features listed below are identified by considering user behaviour, query-text match and common geospatial metadata attributes.</p>

<table>
  <thead>
    <tr>
      <th>Query-dependent features</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Lucene relevance score</td>
    </tr>
    <tr>
      <td>Semantic popularity</td>
    </tr>
    <tr>
      <td>Spatial Similarity</td>
    </tr>
  </tbody>
</table>

<table>
  <thead>
    <tr>
      <th>Query-independent features</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Release date</td>
    </tr>
    <tr>
      <td>Processing level</td>
    </tr>
    <tr>
      <td>Version number</td>
    </tr>
    <tr>
      <td>Spatial resolution</td>
    </tr>
    <tr>
      <td>Temporal resolution</td>
    </tr>
    <tr>
      <td>All-time popularity</td>
    </tr>
    <tr>
      <td>Monthly popularity</td>
    </tr>
    <tr>
      <td>User popularity</td>
    </tr>
  </tbody>
</table>

<p>RankSVM, a well-recognized learning-to-rank approach, is selected to learn the feature weights used to rank search results. In RankSVM (Joachims 2002), ranking is transformed into a pairwise classification task in which a classifier is trained to predict the ranking order of data pairs.</p>
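
<p>The pairwise transformation can be sketched in a few lines. This is an illustrative toy rather than the MUDROD implementation: it builds feature-difference pairs from results of the same query and fits a linear model with plain sub-gradient descent on the regularized hinge loss instead of a dedicated SVM solver. The two-feature vectors, relevance grades and hyperparameters are made up.</p>

```python
import itertools


def pairwise_examples(query_results):
    """Turn graded results for one query into (feature difference, label) pairs."""
    examples = []
    for (feat_a, rel_a), (feat_b, rel_b) in itertools.combinations(query_results, 2):
        if rel_a == rel_b:
            continue  # equally relevant results carry no ordering information
        diff = [a - b for a, b in zip(feat_a, feat_b)]
        label = 1 if rel_a > rel_b else -1
        examples.append((diff, label))
    return examples


def train_linear_ranker(examples, epochs=200, learning_rate=0.1, reg=0.01):
    """Fit weights by sub-gradient descent on the regularized hinge loss."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for diff, label in examples:
            margin = label * sum(w * x for w, x in zip(weights, diff))
            hinge = max(0.0, 1.0 - margin)
            for i, x in enumerate(diff):
                grad = reg * weights[i]
                if hinge > 0:
                    grad -= label * x  # push the pair towards the correct order
                weights[i] -= learning_rate * grad
    return weights


def score(weights, features):
    """Linear ranking score; a higher score means ranked earlier."""
    return sum(w * x for w, x in zip(weights, features))


# Toy training data: one list per query of (feature vector, relevance grade).
training_queries = [
    [([0.9, 0.2], 2), ([0.5, 0.9], 1), ([0.1, 0.4], 0)],
    [([0.8, 0.7], 1), ([0.3, 0.1], 0)],
]
examples = [ex for query in training_queries for ex in pairwise_examples(query)]
weights = train_linear_ranker(examples)
```

<p>Pairs are only formed within a query, because relevance grades from different queries are not comparable. The learned weights can then score any dataset's feature vector directly.</p>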

<center>
	<img src="/images/ranking.png" />
	Figure 1. System workflow and architecture
</center>

<p>The proposed architecture consists of six components: a semantic knowledge base, a geocoding service, a search index, a feature extractor, a learning algorithm, and a ranking model (Figure 1). When a user submits a query, the semantic knowledge base and geocoding service convert it into a semantic query and a geographical bounding box. The search index then returns the top K results for the semantic query combined with the bounding box. Next, the feature extractor extracts the ranking features for each search result, including the semantic click score. Once all the features are prepared, the top K results are passed to a pre-trained ranking model, which re-ranks them. Because the index in this architecture can be any Lucene-based software, the design keeps the data portal loosely coupled and avoids the cost of replacing the existing system.</p>
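
<p>The final re-ranking step might look like the following sketch, which assumes the ranking model is a linear function of the extracted features. The result records, the two features and the weight values are hypothetical stand-ins for the output of the search index, feature extractor and learning algorithm.</p>

```python
def rerank_top_k(search_results, extract_features, weights, k=3):
    """Re-rank the first k search results with a pre-trained linear model."""
    top_k = search_results[:k]
    scored = []
    for position, result in enumerate(top_k):
        features = extract_features(result)
        model_score = sum(w * x for w, x in zip(weights, features))
        scored.append((model_score, position, result))
    # Higher model score first; the original index position breaks ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [result for _, _, result in scored]


def extract_features(result):
    """Hypothetical feature extractor: index relevance and semantic click score."""
    return [result["relevance"], result["click_score"]]


# Hypothetical index output, already ordered by the index's relevance score.
search_results = [
    {"id": "dataset_a", "relevance": 0.9, "click_score": 0.1},
    {"id": "dataset_b", "relevance": 0.7, "click_score": 0.9},
    {"id": "dataset_c", "relevance": 0.6, "click_score": 0.3},
    {"id": "dataset_d", "relevance": 0.2, "click_score": 1.0},
]
pretrained_weights = [1.0, 1.0]  # assumed values, not learned from real data
reranked = rerank_top_k(search_results, extract_features, pretrained_weights, k=3)
reranked_ids = [result["id"] for result in reranked]
```

<p>Only the top K results from the index are re-scored, so a result outside that window (<code>dataset_d</code> here) is never promoted. The index itself is treated as a black box, which is what keeps the portal loosely coupled.</p>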

<p>References:</p>
<ul>
  <li>
    <p>Ghose, Anindya, Panagiotis G. Ipeirotis, and Beibei Li. 2012. “Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content.” Marketing Science 31 (3): 493-520.</p>
  </li>
  <li>
    <p>Joachims, Thorsten. 2002. “Optimizing search engines using clickthrough data.” Paper presented at the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.</p>
  </li>
</ul>




      <!-- footer -->
      <nav class="navbar navbar-default">
        <div class="navbar-header">
          <a class="navbar-brand" href="">SDAP</a>
        </div>
        <div class="navbar-text pull-right">&copy; 2017-2023 The Apache Software Foundation. Licensed under <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License 2.0</a>. <a href="https://privacy.apache.org/policies/privacy-policy-public.html">Privacy Policy</a><br/>
        Apache SDAP, SDAP, Apache, the Apache feather logo, and the Apache SDAP project logo are trademarks of The Apache Software Foundation.</div>
          <div align="center">
              <a href="https://incubator.apache.org/">
                <img src="https://www.apache.org/logos/res/incubator/incubator.png" alt="Apache Incubator" width="250" />
              </a>
          </div>
        <div class="navbar-text pull-right">Apache SDAP is an effort undergoing <a href="https://incubator.apache.org/">Incubation</a> at The Apache Software Foundation (ASF), sponsored by the Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.</div>
      </nav>

      <script src="/js/jquery.min.js"></script>
      <script src="/js/bootstrap.min.js"></script>
    </div>
  </body>
</html>

